clock: bind undeliverable port in sim clock's mailbox #517

Open: wants to merge 1 commit into main

shayne-fletcher
Contributor

Summary:
pesky warnings in Python about using a mailbox for sending without binding the undeliverable port (see D78031232) turned out to be due to the use of `SimClock`.

hack in a "fix" that terminates the process if an undeliverable message is encountered in this context.

we'll definitely want to revisit this, however, and fix it more appropriately (likely by threading through a mailbox from upstack that has the undeliverable port bound and bounces undeliverables into supervision events).
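
To make the "bind the undeliverable port and bounce undeliverables into supervision events" idea concrete, here is a minimal sketch using only the Rust standard library. It is not hyperactor's API: `Mailbox`, `Undeliverable`, `SupervisionEvent`, `bind_undeliverable` and `drain_to_supervision` are hypothetical names invented for illustration, and the real types almost certainly look different.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical: a message that could not be delivered and is returned to the sender.
#[derive(Debug, Clone, PartialEq)]
struct Undeliverable {
    dest: String,
    payload: String,
}

/// Hypothetical stand-in for a supervision event.
#[derive(Debug, Clone, PartialEq)]
enum SupervisionEvent {
    DeliveryFailed { dest: String, payload: String },
}

/// Toy mailbox (an assumption, not the real one): it only knows a fixed set
/// of destinations and an optional port for returned messages.
struct Mailbox {
    known_dests: Vec<String>,
    undeliverable_port: Option<Sender<Undeliverable>>,
}

impl Mailbox {
    fn new(known_dests: &[&str]) -> Self {
        Mailbox {
            known_dests: known_dests.iter().map(|d| d.to_string()).collect(),
            undeliverable_port: None,
        }
    }

    /// Bind the undeliverable port and hand back its receiving end.
    fn bind_undeliverable(&mut self) -> Receiver<Undeliverable> {
        let (tx, rx) = channel();
        self.undeliverable_port = Some(tx);
        rx
    }

    /// Returns true if the destination is known (delivery is simulated).
    /// Otherwise the message is returned through the bound port, or only
    /// logged to stderr when no port is bound.
    fn send(&self, dest: &str, payload: &str) -> bool {
        if self.known_dests.iter().any(|d| d.as_str() == dest) {
            return true;
        }
        let returned = Undeliverable {
            dest: dest.to_string(),
            payload: payload.to_string(),
        };
        match &self.undeliverable_port {
            Some(port) => {
                // Ignore the error if the receiver was already dropped.
                let _ = port.send(returned);
            }
            None => eprintln!("undeliverable port not bound; dropping {:?}", returned),
        }
        false
    }
}

/// Turn every returned message currently queued into a supervision event.
fn drain_to_supervision(rx: &Receiver<Undeliverable>) -> Vec<SupervisionEvent> {
    rx.try_iter()
        .map(|u| SupervisionEvent::DeliveryFailed {
            dest: u.dest,
            payload: u.payload,
        })
        .collect()
}
```

In this sketch, sending to an unknown destination returns `false`. With the port bound, the message is queued and `drain_to_supervision` later surfaces it as an event. With no port bound, the only trace is a line on stderr, which is roughly the situation the warnings were pointing at.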

Differential Revision: D78191921

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 11, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D78191921

shayne-fletcher added a commit to shayne-fletcher/monarch-1 that referenced this pull request Jul 12, 2025
Summary:
Pull Request resolved: pytorch-labs#517

(1) pesky warnings in python about using a mailbox for sending without binding the undeliverable port (see D78031232) turned out to be due to the use of `SimClock`.

hack in a "fix" that ~~terminates the process~~ logs the event if an undeliverable message is encountered in this context (the idea of terminating the process didn't work out: test failures[*]).

we'll definitely want to revisit this however (kaiyuan-li, thomasywang) and fix it more appropriately (maybe by threading through a mailbox from upstack that has the undeliverable port bound and bounces undeliverables into supervision events perhaps?).

(2) provide `unused_return_handle()` and use it to replace the `/*unused*/ monitored_return_handle()` idiom from the codebase.

 ---

[*] the failed tests being,

(1) `buck test 'fbcode//mode/dev-nosan' fbcode//monarch/controller:controller-unittest -- --exact 'monarch/controller:controller-unittest - tests::test_sim_supervision_failure'`,

(2) `buck test 'fbcode//mode/opt' fbcode//monarch/python/tests:test_sim_backend -- --exact 'monarch/python/tests:test_sim_backend - test_sim_backend.py::TestSimBackend::test_local_mesh_setup'`,

(3) `buck test 'fbcode//mode/dev-nosan' fbcode//monarch/hyperactor:hyperactor-unittest -- --exact 'monarch/hyperactor:hyperactor-unittest - clock::tests::test_sim_timeout'`

in all cases i *think* the tests themselves are passing and the undeliverable messages are actually encountered during teardown.

Differential Revision: D78191921
shayne-fletcher added a commit to shayne-fletcher/monarch-1 that referenced this pull request Jul 12, 2025
Summary:
Pull Request resolved: pytorch-labs#517

(1) pesky warnings in python about using a mailbox for sending without binding the undeliverable port (see D78031232) turned out to be due to the use of `SimClock`.

hack in a "fix" that ~~terminates the process~~ logs the event if an undeliverable message is encountered in this context (the idea of terminating the process didn't work out: test failures[*]).

we'll definitely want to revisit this however (kaiyuan-li, thomasywang) and fix it more appropriately (maybe by threading through a mailbox from upstack that has the undeliverable port bound and bounces undeliverables into supervision events perhaps?).

 ---

[*] the failed tests being,

(1) `buck test 'fbcode//mode/dev-nosan' fbcode//monarch/controller:controller-unittest -- --exact 'monarch/controller:controller-unittest - tests::test_sim_supervision_failure'`,

(2) `buck test 'fbcode//mode/opt' fbcode//monarch/python/tests:test_sim_backend -- --exact 'monarch/python/tests:test_sim_backend - test_sim_backend.py::TestSimBackend::test_local_mesh_setup'`,

(3) `buck test 'fbcode//mode/dev-nosan' fbcode//monarch/hyperactor:hyperactor-unittest -- --exact 'monarch/hyperactor:hyperactor-unittest - clock::tests::test_sim_timeout'`

in all cases i *think* the tests themselves are passing and the undeliverable messages are actually encountered during teardown.

Differential Revision: D78191921
@shayne-fletcher shayne-fletcher force-pushed the export-D78191921 branch 2 times, most recently from 34dff8a to 8500f86 Compare July 12, 2025 13:44
Labels
CLA Signed This label is managed by the Meta Open Source bot. fb-exported
2 participants